What is claimed is:



1. A method for quantitatively representing objects in a vector space,
comprising the steps of:
identifying an object to be processed from a plurality of objects;
extracting a feature corresponding to the object from the plurality of objects;
converting the feature to at least one vector; and
associating the at least one vector with the object.

2. The method of claim 1, wherein the object to be processed comprises a
subject document selected from a collection of documents.






3. The method of claim 2, wherein the feature comprises text surrounding the
subject document in a host document.

4. The method of claim 2, wherein the feature comprises text represented by
the subject document.



5. The method of claim 4, wherein the converting step comprises the steps
of:
identifying each unique word within the text represented by all documents in the
collection of documents;
counting the occurrences of each unique word in the subject document; and
creating a vector having a number of dimensions equal to the number of unique
words in the collection of documents, and further having as each element a numeric value
representative of the number of occurrences in the subject document of the corresponding
word.



6. The method of claim 5, wherein the value representative of the number of
occurrences in the subject document of the corresponding word is calculated as the token
frequency weight of the corresponding word multiplied by the inverse context frequency
weight of the corresponding word.
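
The word-count vector of claim 5 and the weighting of claim 6 can be illustrated with a minimal sketch. The claims do not define the two weights, so the sketch assumes the token frequency weight is the raw count and the inverse context frequency weight is log(N / number of documents containing the word); the function names and the whitespace tokenization are likewise hypothetical.

```python
import math
from collections import Counter

def build_vocabulary(collection):
    """Sorted list of every unique word across all documents in the collection."""
    return sorted({word for text in collection for word in text.lower().split()})

def count_vector(text, vocabulary):
    """One element per vocabulary word: its raw count in the subject document."""
    counts = Counter(text.lower().split())
    return [counts[word] for word in vocabulary]

def weighted_vector(text, collection, vocabulary):
    """Assumed weighting (not defined in the claims):
    token frequency weight = raw count,
    inverse context frequency weight = log(N / documents containing the word).
    The vocabulary is assumed to come from the same collection."""
    n_docs = len(collection)
    doc_words = [set(t.lower().split()) for t in collection]
    counts = Counter(text.lower().split())
    vector = []
    for word in vocabulary:
        containing = sum(1 for words in doc_words if word in words)
        vector.append(counts[word] * math.log(n_docs / containing))
    return vector

collection = ["red apple red", "green apple", "green pear"]
vocabulary = build_vocabulary(collection)   # ['apple', 'green', 'pear', 'red']
subject = collection[0]
raw = count_vector(subject, vocabulary)     # [1, 0, 0, 2]
weighted = weighted_vector(subject, collection, vocabulary)
```

Under these assumptions a word that appears in every document receives weight zero, and a word unique to the subject document is weighted most heavily.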

7. The method of claim 2, wherein the feature comprises the subject
document URL representing the subject document in the collection of documents.

8. The method of claim 7, wherein the converting step comprises the steps
of:
identifying each unique word within the URLs representing all documents in the
collection of documents;
counting the occurrences of each unique word in the subject document URL; and
creating a vector having a number of dimensions equal to the number of unique
words in the URLs representing all documents in the collection of documents, and further
having as each element a numeric value representative of the number of occurrences in
the subject document URL of the corresponding word.



9. The method of claim 8, wherein the value representative of the number of
occurrences in the subject document URL of the corresponding word is calculated as the
token frequency weight of the corresponding word multiplied by the inverse context
frequency weight of the corresponding word.
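
A minimal sketch of the URL word vector of claim 8 follows. The claims do not say how a URL is split into words, so the sketch assumes a word is any lowercase run of letters and digits; the function names and example URLs are hypothetical.

```python
import re
from collections import Counter

def url_words(url):
    """Assumed tokenization: lowercase runs of letters and digits."""
    return re.findall(r"[a-z0-9]+", url.lower())

def url_vector(subject_url, all_urls):
    """Return the URL vocabulary of the collection and the subject URL's counts."""
    vocabulary = sorted({w for u in all_urls for w in url_words(u)})
    counts = Counter(url_words(subject_url))
    return vocabulary, [counts[w] for w in vocabulary]

urls = [
    "http://example.com/news/sports",
    "http://example.com/news/news-archive",
]
vocabulary, vector = url_vector(urls[1], urls)
# vocabulary: ['archive', 'com', 'example', 'http', 'news', 'sports']
# vector:     [1, 1, 1, 1, 2, 0]
```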



10. The method of claim 2, wherein the feature comprises inlinks in the
collection of documents linking to the subject document.

11. The method of claim 10, wherein the converting step comprises the steps
of:
identifying each document having links within the collection of documents;
determining how many times each document having links points to the subject
document; and
creating a vector having a number of dimensions equal to the number of
documents having links in the collection of documents, and further having as each
element a numeric value representative of the number of links in each corresponding
document linking to the subject document.

Patent Application
Attorney Docket No. D/99198

12. The method of claim 11, wherein the numeric value representative of the
number of links in each corresponding document linking to the subject document is
calculated as the token frequency weight of the corresponding link multiplied by the
inverse context frequency weight of the corresponding link.
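
The inlink vector of claim 11 might be sketched as below. The link table, document names, and function name are hypothetical; the sketch returns raw link counts and leaves out the weighting of claim 12.

```python
def inlink_vector(subject, links):
    """links maps each document to the list of documents it links to
    (repeated targets allowed). One dimension per document that has links;
    each element counts that document's links to the subject document."""
    linking_docs = sorted(doc for doc, targets in links.items() if targets)
    return linking_docs, [links[doc].count(subject) for doc in linking_docs]

links = {
    "a.html": ["c.html", "c.html", "b.html"],
    "b.html": ["c.html"],
    "c.html": [],                      # no links, so no dimension
}
docs, vector = inlink_vector("c.html", links)
# docs:   ['a.html', 'b.html']
# vector: [2, 1]
```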




13. The method of claim 10, wherein the converting step comprises the steps
of:
identifying each document having hyperlinks within the collection of documents,
and further identifying each unique word associated with URLs defining hyperlinks in
each document;
counting the occurrences of each unique word in the URLs defining hyperlinks
pointing to the subject document; and
creating a vector having a number of dimensions equal to the number of unique
words associated with URLs defining hyperlinks within the collection of documents, and
further having as each element a numeric value representative of the number of
occurrences in the URLs defining hyperlinks pointing to the subject document of the
corresponding word.



14. The method of claim 13, wherein the numeric value representative of the
number of occurrences in the URLs defining hyperlinks pointing to the subject document
of the corresponding word is calculated as the token frequency weight of the
corresponding word multiplied by the inverse context frequency weight of the
corresponding word.

15. The method of claim 2, wherein the feature comprises outlinks in the 
subject document linking to other documents. 




16. The method of claim 15, wherein the converting step comprises the steps
of:
identifying each other document linked to by all documents within the collection
of documents; and
creating a vector having a number of dimensions equal to the number of other
documents linked to by documents in the collection of documents, and further having as
each element a numeric value representative of the number of links in the subject
document linking to each corresponding other document.



17. The method of claim 16, wherein the numeric value representative of the
number of links in the subject document linking to each corresponding other document is
calculated as the token frequency weight of the corresponding link multiplied by the
inverse context frequency weight of the corresponding link.
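
The outlink vector of claim 16 can be sketched in the same spirit. The link table and names below are hypothetical, and only the raw counts are shown, not the weighting of claim 17.

```python
def outlink_vector(subject, links):
    """links maps each document to the list of documents it links to.
    One dimension per document that is linked to from anywhere in the
    collection; each element counts the subject document's links to it."""
    targets = sorted({t for outs in links.values() for t in outs})
    return targets, [links[subject].count(t) for t in targets]

links = {
    "a.html": ["c.html", "c.html", "b.html"],
    "b.html": ["c.html"],
    "c.html": [],
}
targets, vector = outlink_vector("a.html", links)
# targets: ['b.html', 'c.html']
# vector:  [1, 2]
```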



18. The method of claim 15, wherein the converting step comprises the steps
of:
identifying each unique word associated with URLs defining hyperlinks in each
document in the collection of documents;
counting the occurrences of each unique word in the URLs defining hyperlinks in
the subject document; and
creating a vector having a number of dimensions equal to the number of unique
words associated with the URLs defining hyperlinks in each document, and further
having as each element a numeric value representative of the number of occurrences in
the URLs defining hyperlinks in the subject document of the corresponding word.

19. The method of claim 18, wherein the numeric value representative of the
number of occurrences in the URLs defining hyperlinks in the subject document of the
corresponding word is calculated as the token frequency weight of the corresponding
word multiplied by the inverse context frequency weight of the corresponding word.

20. The method of claim 2, wherein the feature comprises the genre of the text
represented by the subject document.

21. The method of claim 20, wherein the converting step comprises the steps
of:
for each possible text genre, processing the subject document to calculate the
probability that the subject document is of the corresponding genre; and
creating a vector having a number of dimensions equal to the number of possible
text genres, and further having as each element a numeric value representative of the
probability that the subject document is of the corresponding genre.



22. The method of claim 2, wherein the feature comprises the color histogram
for an image represented by the subject document.



23. The method of claim 22, wherein the converting step comprises the steps
of:
quantizing the image represented by the subject document into a multi-
dimensional color model;
creating a color histogram having a plurality of bins for each dimension in the
color model, each bin corresponding to a unique combination of binary bits representing
information from the associated dimension of the color model;
counting each of a plurality of pixels from the image in a corresponding bin
associated with each dimension of the color model; and
creating a vector having a number of dimensions equal to the total number of bins
in the color histogram, and further having as each element a numeric value representative
of the number of pixels in the image corresponding to the corresponding histogram bin.


24. The method of claim 23, wherein the plurality of pixels from the image in 
the counting step comprises all of the pixels in the image. 

25. The method of claim 24, wherein the plurality of pixels from the image in
the counting step comprises an approximately uniformly spaced set of subsampled pixels
from the image.

26. The method of claim 23, wherein:
the color model comprises a three-dimensional hue, saturation, and value color
model;
each dimension of the color model is represented by two bits of information; and
the color histogram has four bins for each dimension in the color model, for a
total of twelve bins.
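
The twelve-bin histogram of claims 23 and 26 might be computed as in the sketch below. It assumes 8-bit RGB input converted with Python's standard `colorsys.rgb_to_hsv`, and it assumes each of hue, saturation, and value is quantized to two bits by scaling the [0, 1] range into four equal levels; the claims do not specify either detail. The `step` parameter is a hypothetical stand-in for uniform subsampling of pixels.

```python
import colorsys

def quantize(value, bits=2):
    """Map a value in [0, 1] to an integer level in [0, 2**bits - 1]."""
    levels = 1 << bits
    return min(int(value * levels), levels - 1)

def hsv_histogram(pixels, bits=2, step=1):
    """pixels: list of (r, g, b) tuples with components in 0..255.
    Returns a vector with 2**bits bins for each of H, S and V (12 by default).
    Every sampled pixel is counted once in each dimension's group of bins."""
    levels = 1 << bits
    histogram = [0] * (3 * levels)
    for r, g, b in pixels[::step]:          # step > 1 subsamples uniformly
        hsv = colorsys.rgb_to_hsv(r / 255, g / 255, b / 255)
        for dim, value in enumerate(hsv):
            histogram[dim * levels + quantize(value, bits)] += 1
    return histogram

pixels = [(255, 0, 0), (255, 0, 0), (0, 0, 0), (255, 255, 255)]
vector = hsv_histogram(pixels)
# bins 0-3 hold hue, 4-7 saturation, 8-11 value:
# [4, 0, 0, 0, 2, 0, 0, 2, 1, 0, 0, 3]
```

Because each pixel contributes to one bin per dimension, the elements sum to three times the number of sampled pixels.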

27. The method of claim 23, wherein the image represented by the subject
document comprises a region of a bitmap.



28. The method of claim 2, wherein the feature comprises the color
complexity of an image represented by the subject document.



29. The method of claim 28, wherein the converting step comprises the steps
of:
quantizing the image represented by the subject document into a multi-
dimensional color model;
determining the maximum number of pixels in any row in any image represented
by a document in the collection of documents;
determining the maximum number of pixels in any column in any image
represented by a document in the collection of documents;
creating a horizontal complexity histogram and a vertical complexity histogram,
each having a number of bins equal to the maximum number of pixels in any row and in
any column, respectively;
identifying horizontal runs of pixels of all possible lengths in the quantized image,
and for each possible length, counting the number of pixels in a plurality of rows of the
quantized image belonging to the horizontal runs in a corresponding bin of the horizontal
complexity histogram;
identifying vertical runs of pixels of all possible lengths in the quantized image,
and for each possible length, counting the number of pixels in a plurality of columns of
the quantized image belonging to the vertical runs in a corresponding bin of the
horizontal complexity histogram;
creating a horizontal complexity vector having a number of dimensions equal to
the maximum number of pixels in any row, and further having as each element a numeric
value representing the number of pixels in the image in the corresponding horizontal
histogram bin; and
creating a vertical complexity vector having a number of dimensions equal to the
maximum number of pixels in any column, and further having as each element a numeric
value representing the number of pixels in the image in the corresponding vertical
histogram bin.

30. The method of claim 29, wherein the plurality of rows comprises all rows
of the quantized image, and wherein the plurality of columns comprises all columns of
the quantized image.






31. The method of claim 29, wherein the plurality of rows comprises an
approximately uniformly spaced set of subsampled rows from the image, and wherein the
plurality of columns comprises an approximately uniformly spaced set of subsampled
columns from the image.



32. The method of claim 29, wherein:
the color model comprises a three-dimensional hue, saturation, and value color
model; and
each dimension of the color model is represented by two bits of information.



33. The method of claim 29, further comprising the step of concatenating the
horizontal complexity vector and the vertical complexity vector to form a complexity
vector having a number of dimensions equal to the maximum number of pixels in any
row plus the maximum number of pixels in any column.
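
The run-length complexity vectors of claim 29 and their concatenation in claim 33 can be sketched as follows. The sketch assumes the image has already been quantized into integer color codes, treats a run as a maximal sequence of equal codes, and uses a single small image whose dimensions stand in for the collection-wide maxima; these choices and all names are assumptions.

```python
def run_lengths(line):
    """Lengths of the maximal runs of equal values in one row or column."""
    runs, length = [], 0
    for i, value in enumerate(line):
        if i > 0 and value != line[i - 1]:
            runs.append(length)
            length = 0
        length += 1
    if length:
        runs.append(length)
    return runs

def complexity_vector(lines, n_bins):
    """Bin k (0-based) accumulates the pixels that lie in runs of length k + 1."""
    histogram = [0] * n_bins
    for line in lines:
        for length in run_lengths(line):
            histogram[length - 1] += length
    return histogram

image = [                 # quantized color codes, 3 rows by 4 columns
    [0, 0, 1, 1],
    [0, 1, 1, 1],
    [2, 2, 2, 2],
]
max_row, max_col = 4, 3   # assumed maxima over the (one-image) collection
horizontal = complexity_vector(image, max_row)            # [1, 4, 3, 4]
vertical = complexity_vector(list(zip(*image)), max_col)  # [6, 6, 0]
complexity = horizontal + vertical                        # 4 + 3 = 7 dimensions
```

An image dominated by long runs concentrates its pixel counts in the high bins, while a busy image concentrates them in the low bins.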

34. The method of claim 28, wherein the converting step comprises the steps
of:
quantizing the image represented by the subject document into a multi-
dimensional color model;
determining the maximum number of pixels in any row in any image represented
by a document in the collection of documents;
determining the maximum number of pixels in any column in any image
represented by a document in the collection of documents;
creating a horizontal complexity histogram and a vertical complexity histogram,
each having a selected number of bins corresponding to a plurality of quantized ranges of
run lengths;
identifying horizontal runs of pixels of all possible lengths in the quantized image,
and for each possible length, counting the number of pixels in a plurality of rows of the
quantized image belonging to the horizontal runs in a corresponding bin of the horizontal
complexity histogram;
identifying vertical runs of pixels of all possible lengths in the quantized image,
and for each possible length, counting the number of pixels in a plurality of columns of
the quantized image belonging to the vertical runs in a corresponding bin of the
horizontal complexity histogram;
creating a horizontal complexity vector having a number of dimensions equal to
the selected number of bins in the horizontal complexity histogram, and further having as
each element a numeric value representing the number of pixels in the image in the
corresponding horizontal histogram bin; and
creating a vertical complexity vector having a number of dimensions equal to the
number of bins in the vertical complexity histogram, and further having as each element a
numeric value representing the number of pixels in the image in the corresponding
vertical histogram bin.

35. The method of claim 34, wherein:
a bin b_x in the horizontal complexity histogram corresponding to a horizontal run
of length r_x is identified by a relationship b_x = floor(r_x(N-1) / (n_x/4)) + 1, where N is the
selected number of bins in the horizontal complexity histogram and n_x is a maximum
number of pixels in any row of an image in the collection; and
a bin b_y in the vertical complexity histogram corresponding to a vertical run of
length r_y is identified by a relationship b_y = floor(r_y(N-1) / (n_y/4)) + 1, where N is the
selected number of bins in the horizontal complexity histogram and n_y is a maximum
number of pixels in any row of an image in the collection.
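
The quantized bin assignment can be sketched as below, assuming the relationship reads b = floor(r(N-1) / (n/4)) + 1 with 1-based bin numbers. Under that reading a run longer than n/4 would index past bin N, so the sketch clamps the result to N; the clamp and the function name are assumptions, not part of the claim.

```python
import math

def run_bin(run_length, n_bins, max_pixels):
    """1-based bin for a run of run_length pixels, where n_bins is the selected
    number of bins (N) and max_pixels is the collection-wide maximum (n).
    Clamping to n_bins is an assumption for runs longer than max_pixels / 4."""
    b = math.floor(run_length * (n_bins - 1) / (max_pixels / 4)) + 1
    return min(b, n_bins)

# With N = 5 bins and n = 40 pixels, runs up to 10 pixels spread over bins 1..5.
bins = [run_bin(r, 5, 40) for r in (1, 5, 10, 40)]
# bins: [1, 3, 5, 5]
```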



36. The method of claim 34, wherein the plurality of rows comprises an
approximately uniformly spaced set of subsampled rows from the image, and wherein the
plurality of columns comprises an approximately uniformly spaced set of subsampled
columns from the image.

37. The method of claim 34, wherein:
the color model comprises a three-dimensional hue, saturation, and value color
model; and
each dimension of the color model is represented by two bits of information.

38. The method of claim 34, further comprising the step of concatenating the
horizontal complexity vector and the vertical complexity vector to form a complexity
vector having a number of dimensions equal to the selected number of bins in the
horizontal complexity histogram plus the selected number of bins in the vertical
complexity histogram.

39. The method of claim 1, wherein the object to be processed comprises a 
subject user selected from a user population. 

40. The method of claim 39, wherein the feature comprises the documents in a
collection of documents accessed by the subject user.

41. The method of claim 40, wherein the converting step comprises the steps
of:
identifying each unique document in the collection of documents;
calculating the number of times the subject user accessed each document in the
collection of documents; and
creating a vector having a number of dimensions equal to the number of
documents in the collection of documents, and further having as each element a numeric
value representative of the number of times the subject user has accessed the
corresponding document.
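
A minimal sketch of the user access vector of claim 41 follows. It assumes accesses are available as a log of (user, document) pairs, which the claims do not specify; the names and sample data are hypothetical.

```python
from collections import Counter

def access_vector(access_log, user, documents):
    """One element per document in the collection: how many times
    the subject user accessed that document."""
    counts = Counter(doc for who, doc in access_log if who == user)
    return [counts[doc] for doc in documents]

documents = ["a.html", "b.html", "c.html"]
access_log = [
    ("ann", "a.html"),
    ("bob", "a.html"),
    ("ann", "c.html"),
    ("ann", "a.html"),
]
vector = access_vector(access_log, "ann", documents)
# vector: [2, 0, 1]
```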

42. The method of claim 41, wherein the value representative of the number
of times the subject user has accessed the corresponding document is calculated as the
token frequency weight of the corresponding document multiplied by the inverse context
frequency weight of the corresponding document.